Regression Analysis involves curve fitting.
Curve fitting: The process of finding a relation or equation of best fit.
\[Y = f(x_1, x_2, x_3) + \epsilon\]
Goal: How do we estimate \(f\)?
Non-parametric methods: estimate \(f\) using observed data without making explicit assumptions about the functional form of \(f\).
Parametric methods: estimate \(f\) using observed data by making assumptions about the functional form of \(f\).
Ex: \(Y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \epsilon\)
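The two approaches to estimating \(f\) can be contrasted with a minimal sketch. The simulated data, the coefficient values, and the helper names `fit_parametric` and `predict_knn` are illustrative assumptions, and k-nearest-neighbour averaging is only one possible assumption-free estimator, chosen here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative assumption): three predictors, linear truth.
n = 200
X = rng.normal(size=(n, 3))
true_beta = np.array([1.0, 2.0, -1.0, 0.5])  # beta_0, beta_1, beta_2, beta_3
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=n)


def fit_parametric(X, y):
    """Assume f is linear and estimate its coefficients by least squares."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta_hat


def predict_knn(X_train, y_train, x_new, k=5):
    """Assume no form for f: average the responses of the k nearest points."""
    dist = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dist)[:k]
    return y_train[nearest].mean()


beta_hat = fit_parametric(X, y)
x_new = np.array([0.2, -0.1, 0.3])
pred_parametric = beta_hat[0] + x_new @ beta_hat[1:]
pred_knn = predict_knn(X, y, x_new)
```

The parametric fit reduces the problem to estimating four numbers, while the nearest-neighbour estimate relies entirely on the observations close to `x_new`.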
The correlation coefficient measures the strength of the linear relationship between two quantitative variables.
It takes a value between -1 and +1. A value of -1 indicates a perfect negative linear relationship and +1 indicates a perfect positive linear relationship.
It does not completely characterize their relationship.
It is very sensitive to outliers.
\[ \sigma^2 = \frac{\sum_{i=1}^N (x_i-\mu_x)^2}{N} \]
\[ \sigma = \sqrt{\frac{\sum_{i=1}^N (x_i-\mu_x)^2}{N}} \]
\[ cov(x,y) = \frac{\sum_{i=1}^N (x_i-\mu_x)(y_i-\mu_y)}{N} \]
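The formulas above combine into the standard (Pearson) correlation coefficient, \(r = cov(x,y) / (\sigma_x \sigma_y)\). That ratio is not written out in these notes, so take it as an added assumption about what is intended. The sketch below uses the population (divide by \(N\)) versions of the formulas; the function names and the toy data are illustrative.

```python
import numpy as np


def population_variance(x):
    """Mean squared deviation from the mean (divides by N)."""
    x = np.asarray(x, dtype=float)
    return np.sum((x - x.mean()) ** 2) / len(x)


def population_covariance(x, y):
    """Mean product of deviations from the two means (divides by N)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / len(x)


def correlation(x, y):
    """Covariance scaled by the product of the two standard deviations."""
    sd_x = np.sqrt(population_variance(x))
    sd_y = np.sqrt(population_variance(y))
    return population_covariance(x, y) / (sd_x * sd_y)


# Points lying exactly on an increasing straight line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0
r_line = correlation(x, y)

# The same points plus one extreme observation (illustrative outlier).
x_out = np.append(x, 20.0)
y_out = np.append(y, -30.0)
r_outlier = correlation(x_out, y_out)
```

Comparing `r_line` with `r_outlier` shows how a single extreme point can change the coefficient substantially, here even reversing its sign.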
Your turn: Create a geometrical demonstration
Response variable: dependent variable
Explanatory variables: independent variables, predictors, regressor variables, features (in Machine Learning)
Simple - a single regressor.
Linear - the parameters enter the model linearly; the regressors themselves may be transformed.
\[Y = \beta_0 + \beta_1x_1 + \beta_{2}x_2 + \epsilon\]
\[Y = \beta_0 + \beta_1x + \beta_{2}x^2 + \epsilon\]
\[Y = \beta_0e^{\beta_1x} + \epsilon\]
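Of the three models above, the first two are linear in the parameters, while the exponential model is not, because \(\beta_1\) sits inside the exponent. As a rough illustration, the quadratic model can be fitted with ordinary linear least squares by treating \(x\) and \(x^2\) as two separate columns. The simulated data and coefficient values below are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data from Y = 1 + 0.5 x + 2 x^2 + noise (illustrative values).
x = np.linspace(-2, 2, 50)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.1, size=x.size)

# The model is nonlinear in x but linear in (beta_0, beta_1, beta_2),
# so the design matrix simply has columns 1, x, and x^2.
design = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
```

No such design matrix exists for \(Y = \beta_0e^{\beta_1x} + \epsilon\), which is why that model calls for nonlinear estimation instead.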
What about this?
\[Y = \alpha X_1^\beta X_2^\gamma X_3^\delta e^\epsilon\]
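One way to approach this model, assuming \(Y\), \(\alpha\), and every \(X_j\) are strictly positive, is to take natural logarithms of both sides:

\[\ln Y = \ln\alpha + \beta\ln X_1 + \gamma\ln X_2 + \delta\ln X_3 + \epsilon\]

With \(\ln Y\) as the response and \(\ln X_1, \ln X_2, \ln X_3\) as regressors, the parameters \(\ln\alpha, \beta, \gamma, \delta\) enter linearly, so the transformed model is a linear regression model. This works because the error enters multiplicatively as \(e^\epsilon\); with an additive error the log transform would not linearize the model.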
True relationship between X and Y in the population
\[Y = f(X) + \epsilon\]
If \(f\) is approximated by a linear function
\[Y = \beta_0 + \beta_1X + \epsilon\]
Assume the error terms are normally distributed with mean \(0\) and variance \(\sigma^2\). Then the mean response of \(Y\) at any given value of \(X\) is
\[E(Y|X=x_i) = E(\beta_0 + \beta_1x_i + \epsilon)=\beta_0+\beta_1x_i\]
For a single unit \((y_i, x_i)\)
\[y_i = \beta_0 + \beta_1x_i+\epsilon_i \text{ where } \epsilon_i \sim N(0, \sigma^2)\]
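A small simulation can make the mean-response statement concrete: holding \(x_i\) fixed and drawing many errors, the average of the simulated \(y_i\) should sit close to \(\beta_0 + \beta_1x_i\). The parameter values and the chosen \(x_i\) are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative (assumed) population parameters.
beta0, beta1, sigma = 2.0, 3.0, 1.0
x_i = 1.5

# Draw many units that share the same x_i but have independent errors.
eps = rng.normal(loc=0.0, scale=sigma, size=100_000)
y = beta0 + beta1 * x_i + eps

empirical_mean = y.mean()             # sample average of Y at X = x_i
theoretical_mean = beta0 + beta1 * x_i  # E(Y | X = x_i) under the model
empirical_variance = y.var()          # should be near sigma^2
```

Individual values of `y` scatter around the line with variance \(\sigma^2\); only their mean lies on it.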
We use sample values \((y_i, x_i)\), where \(i = 1, 2, \ldots, n\), to estimate \(\beta_0\) and \(\beta_1\).
The fitted regression model is
\[\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1x_i\]
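The notes do not say here which estimator produces \(\hat{\beta}_0\) and \(\hat{\beta}_1\). Assuming ordinary least squares, the usual choice, the estimates have a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of \(x\), and the intercept makes the line pass through \((\bar{x}, \bar{y})\). The function name `fit_simple_ols` and the sample values are hypothetical.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Closed-form least-squares estimates for y = beta0 + beta1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Made-up sample values, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

beta0_hat, beta1_hat = fit_simple_ols(x, y)
y_hat = beta0_hat + beta1_hat * x  # fitted values on the regression line
residuals = y - y_hat              # observed minus fitted
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on any implementation.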
\[E(Y|X=x_i) = \beta_0+\beta_1x_i\]
Linear Regression
Quantile Regression
Piece-wise (Segmented) Regression
LOESS (Locally Estimated Scatterplot Smoothing)
Hodrick-Prescott (HP) Filter
Multivariate Regression